Skip to content

Conversation

bahrmichael
Copy link
Collaborator

Fixes #13


Amp generated PR description:

When a WebSocket client became unresponsive and timed out, the ping timer handler caused a "handle is already closing" error by attempting to close the same TCP handle twice.

The race condition occurred in tcp.lua's ping timeout logic:

  1. close_client(client, 1006, ...) was called, which writes a Close frame and closes the handle in its write callback
  2. _remove_client(server, client) was called immediately after, which also closed the handle if not is_closing()

Since the write callback hadn't fired yet, the handle appeared not to be closing, so both code paths attempted to close it.

Changes made:

  1. Ping timeout handler (tcp.lua): Removed the call to close_client() for dead clients. Now only calls _remove_client() and on_disconnect() to drop the connection without sending a Close frame. This is correct per RFC 6455, which states code 1006 MUST NOT be sent in a Close frame.

  2. close_client() hardening (client.lua): Made fully defensive:

    • Checks is_closing() before any close operation
    • Never sends code 1006 in Close frame (RFC 6455 compliance)
    • Fixed state transitions: sets "closing" before I/O, "closed" after
    • Guards all close() calls to prevent double-close races
  3. _remove_client() hardening (tcp.lua): Added explicit state management and a comment explaining that close() is async but marking "closed" immediately is safe since the client is removed from the active list.

Why this implementation is safe:

  • No code path distinguishes between "closing" and "closed" states
  • All state checks only care if client.state == "connected" or not
  • Once state is != "connected", the client won't receive new operations
  • The is_closing() guards prevent any double-close attempts
  • Marking state as "closed" immediately after close() is semantically accurate since the client is already removed from server.clients

Amp-Thread-ID: https://ampcode.com/threads/T-eb322254-0c96-4330-ae76-53b06d81f3a3

Fixes #13

When a WebSocket client became unresponsive and timed out, the ping
timer handler caused a "handle is already closing" error by attempting
to close the same TCP handle twice.

The race condition occurred in tcp.lua's ping timeout logic:
1. close_client(client, 1006, ...) was called, which writes a Close
   frame and closes the handle in its write callback
2. _remove_client(server, client) was called immediately after, which
   also closed the handle if not is_closing()

Since the write callback hadn't fired yet, the handle appeared not to
be closing, so both code paths attempted to close it.

Changes made:

1. Ping timeout handler (tcp.lua): Removed the call to close_client()
   for dead clients. Now only calls _remove_client() and on_disconnect()
   to drop the connection without sending a Close frame. This is correct
   per RFC 6455, which states code 1006 MUST NOT be sent in a Close frame.

2. close_client() hardening (client.lua): Made fully defensive:
   - Checks is_closing() before any close operation
   - Never sends code 1006 in Close frame (RFC 6455 compliance)
   - Fixed state transitions: sets "closing" before I/O, "closed" after
   - Guards all close() calls to prevent double-close races

3. _remove_client() hardening (tcp.lua): Added explicit state management
   and a comment explaining that close() is async but marking "closed"
   immediately is safe since the client is removed from the active list.

Why this implementation is safe:

- No code path distinguishes between "closing" and "closed" states
- All state checks only care if client.state == "connected" or not
- Once state is != "connected", the client won't receive new operations
- The is_closing() guards prevent any double-close attempts
- Marking state as "closed" immediately after close() is semantically
  accurate since the client is already removed from server.clients

Amp-Thread-ID: https://ampcode.com/threads/T-eb322254-0c96-4330-ae76-53b06d81f3a3
Co-authored-by: Amp <[email protected]>
@bahrmichael bahrmichael requested a review from burmudar October 2, 2025 15:19
@burmudar
Copy link
Contributor

burmudar commented Oct 7, 2025

@bahrmichael forgot about this - will get to it by end of week!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error when client has died
2 participants